Properties of the word set for estimating similarities between prokaryotic genomes in linguistic approach

Authors

Keishin Hanya

Satoshi Mizuta

Abstract

Recently, as the number of completely sequenced genomes has increased rapidly, comparison of whole-genome sequences has become more important. The linguistic approach is one of the available methods for estimating similarities between long sequences such as whole genomes [1]. In this method, a word set W is constructed, where a word is a fixed-length string over the four-letter nucleotide alphabet (A, C, G, T), and the frequency of appearance of each word in W is counted throughout a genome. The similarity between two genomes is then estimated by comparing their word-frequency distributions with a measure such as Kendall's rank correlation. To apply the linguistic approach, three properties of W must be determined in advance: the word length L, the size n (the number of words in W), and the contents (which words W contains). In a previous study [2], by analyzing word diversity in prokaryotic genomes, we found that L = 8 to 12 is appropriate for prokaryotic species. In this study, we investigate the other two properties, the size and the contents of W, that are adequate for analyzing similarity between prokaryotic genomes.
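The pipeline described above (count fixed-length words, restrict the counts to a word set W, and compare the resulting frequency rankings with Kendall's rank correlation) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, choosing W as the n words with the highest combined frequency is an assumption made only for illustration (the contents of W are precisely what the study investigates), and the tie-corrected tau-b variant of Kendall's correlation is likewise an assumption, since the abstract does not specify a variant.

```python
import math
from collections import Counter

NUCLEOTIDES = set("ACGT")


def count_words(genome: str, L: int) -> Counter:
    """Count every overlapping length-L word in a genome string.
    Windows containing any character other than A, C, G, T are skipped."""
    seq = genome.upper()
    counts = Counter()
    for i in range(len(seq) - L + 1):
        word = seq[i:i + L]
        if set(word) <= NUCLEOTIDES:
            counts[word] += 1
    return counts


def build_word_set(all_counts, n):
    """HYPOTHETICAL choice of W: the n words with the largest summed
    frequency over all genomes, ties broken alphabetically."""
    total = Counter()
    for counts in all_counts:
        total.update(counts)
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def frequency_vector(counts, W):
    """Frequencies of the words of W in one genome (0 if a word is absent)."""
    return [counts.get(w, 0) for w in W]


def kendall_tau_b(x, y):
    """Kendall's rank correlation, tie-corrected tau-b, O(n^2) pairwise form.
    Returns NaN when either vector is constant (tau is undefined)."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    concordant = discordant = untied_x = untied_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of x_i - x_j
            dy = (y[i] > y[j]) - (y[i] < y[j])  # sign of y_i - y_j
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    if untied_x == 0 or untied_y == 0:
        return float("nan")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


def genome_similarity(g1: str, g2: str, L: int, n: int) -> float:
    """Similarity of two genomes: Kendall's tau between their frequency
    vectors over a shared word set W of size n (word length L)."""
    c1, c2 = count_words(g1, L), count_words(g2, L)
    W = build_word_set([c1, c2], n)
    return kendall_tau_b(frequency_vector(c1, W), frequency_vector(c2, W))


if __name__ == "__main__":
    # Toy sequences with a small L; real genomes would use much longer inputs.
    a = "ACGTACGTTTGCAACGTAGGCT"
    b = "ACGTTCGTTTGCAACGAAGGCT"
    print(genome_similarity(a, b, L=3, n=8))
```

The toy run uses L = 3 only because the example strings are short; for prokaryotic genomes the previous study suggests L = 8 to 12. At L = 12 there are 4^12 = 16,777,216 possible words, so the quadratic Kendall routine above is only practical for a modest n; for larger word sets an O(n log n) implementation such as `scipy.stats.kendalltau` is the usual substitute.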

Similar resources

The Effect of Word Meaning on Speech DysFluency in Adults with Developmental Stuttering

Objectives: Stuttering is one of the most prevalent speech and language disorders. Symptomology of stuttering has been surveyed from different aspects such as biological, developmental, environmental, emotional, learning and linguistic. Previous research on English-speaking people has suggested that some linguistic features such as word meanings may play a role in the frequency of speech non...

Arithmetic Aggregation Operators for Interval-valued Intuitionistic Linguistic Variables and Application to Multi-attribute Group Decision Making

The intuitionistic linguistic set (ILS) is an extension of linguistic variable. To overcome the drawback of using single real number to represent membership degree and non-membership degree for ILS, the concept of interval-valued intuitionistic linguistic set (IVILS) is introduced through representing the membership degree and non-membership degree with intervals for ILS in this paper. The oper...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

A Comparison of Relationship between Text and Picture in the Selected Iranian and Contemporary American-European Illustrated-Fiction Books Based on the Theory of Maria Nikolajeva and Carole Scott

Illustrated-fiction books are special forms of art that are the combination of text and picture. The relationship between text and picture in this genre is diverse and variegated, and has different effects on the audience; however, little research has been done about it. The goal of this research is to compare text/picture relationship in the selected Iranian and contemporary American-European ...

Iranian Advanced EFL Learners’ Awareness and the Use of Marked Word Order: Discourse-pragmatically Motivated Variations

The present investigation was designed to study the production and comprehension of specific means for information highlighted by advanced Iranian learners of English as a Foreign Language. The study focused on the discourse-pragmatically motivated variations of the basic word order such as inversion, pre-posing, it- and Wh-clefts. After taking the Nelson test, a homogeneous group was settled. ...

Publication date: 2009